YouTube Link: https://youtu.be/LKBlnTX6nj0

1 Introduction

Internet-supported peer-to-peer platforms such as Airbnb have drawn the interest of researchers and the attention of governments because of their impact on traditional real estate markets (Einav, Farronato, & Levin, 2016). On the demand side, we assume that tourists differ from residents in that they require only short-term accommodation rather than long-term rentals or mortgages. On the supply side, short-term rentals (intended to meet tourist demand) are more profitable than long-term rentals (for residents), which leads property owners to redirect their supply towards the tourist market. This eventually raises rental prices and sometimes displaces residents from city centers to the outskirts (Beatriz & Iis, 2020).

In this project, we use the Airbnb data set for Amsterdam to predict apartment prices and available days from apartment characteristics and neighborhood-level data. Based on the predictions of the OLS models, we then compare the short-term rental income from Airbnb with the long-term rental income from the regular housing market.

a. Stakeholders

Hosts who need data to decide whether to list their property on Airbnb as a short-term rental or on the local market as a long-term rental are the direct beneficiaries of this prediction. Professional real estate agents and housing managers can also draw on the results: they can identify which areas are more worth investing in and which rental market yields more income, and so allocate the properties they manage more rationally. The Airbnb platform should also be interested in the results if it wants to attract more hosts, although it may well run similar analyses internally. If the models provide solid evidence that Airbnb rentals are more profitable than local long-term rentals, Airbnb could present these results to prospective hosts, for example on its hosts' webpage.

b. Use case

Some websites provide Airbnb price information to support hosts' decisions, but we are not aware of any website that compares Airbnb rents with local long-term rental prices. Our model also predicts the number of days a listing is available for rent within a month. The reliability of this prediction still needs to be studied because of limitations in the data set, but it offers an additional perspective.

2 Methods

a. Neighborhood as a Study Unit

Amsterdam is broken up into 8 districts or boroughs (Centrum, Zuid, West, Oost, Noord, Nieuw-West, Zuidoost, Westpoort), which are further divided into neighborhoods.

Our final goal is to compare short-term and long-term rental prices. For long-term rentals, we use a data set of average rent prices at the district level. For short-term rentals, we multiply the price per day by the number of days available per month.
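The comparison described above can be sketched in base R. This is a minimal illustration, not the project's actual pipeline: the data frames, column names (`price_per_day`, `days_per_month`, `long_term_rent`) and all numbers are hypothetical placeholders.

```r
# Hypothetical per-listing predictions (illustrative values, not model output)
listings <- data.frame(
  district       = c("Centrum", "Centrum", "Zuid"),
  price_per_day  = c(150, 200, 120),  # predicted price per day
  days_per_month = c(12, 10, 15)      # predicted available days per month
)

# Hypothetical long-term monthly rents by district
long_term <- data.frame(
  district       = c("Centrum", "Zuid"),
  long_term_rent = c(1600, 1500)
)

# Short-term monthly income = price per day * days per month
listings$short_term_income <- listings$price_per_day * listings$days_per_month

# Average the short-term income per district and compare with long-term rent
short_term <- aggregate(short_term_income ~ district, data = listings, FUN = mean)
comparison <- merge(short_term, long_term, by = "district")
comparison$difference <- comparison$short_term_income - comparison$long_term_rent
print(comparison)
```

A positive `difference` would indicate that, under these assumed inputs, the short-term market yields more per month than the long-term market in that district.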

b. Machine Learning Process

i. Data Wrangling

Our first step is to compile the data we need into one data set. This includes the two dependent variables, the price and the number of available days within a 30-day window for each Airbnb apartment in Amsterdam, and the features needed to predict these two outcomes.

ii. Exploratory Analysis

We will investigate both the underlying spatial process in the outcomes of interest and the trends and correlations between the outcomes and the predictive features, using scatter plots.

iii. Feature Engineering

We will carry out feature engineering in three areas: 1) reclassifying some variables from the Airbnb listing data; 2) measuring the distance to public services/(dis)amenities; 3) text analysis to check whether the reviews of an apartment contain certain words.

iv. Feature Selection

Although we have nearly a hundred features to choose from in this project, we will limit the number of features actually used by the model to at most 20.

v. Prediction Model

We will predict prices and available days with two separate OLS models.

c. Prediction Model

House price prediction has been a common use case in cities that use data to assess property taxes. The hedonic model is a theoretical framework for predicting home prices by breaking down house prices into the value of their constituent parts, such as the presence of a pool or the amount of local crime.

For our purpose, Airbnb apartment prices and available days can be decomposed into three constituent parts: 1) physical characteristics, such as whether the apartment provides Wi-Fi or a TV; 2) public services/(dis)amenities, such as the distance to transit stations or to historical architecture; 3) review text analysis, that is, whether specific words appear in the listing's review text. However, when developing the regression models we omit the spatial process of our dependent variables, namely how they cluster at the neighborhood, district, and city scales.

3 Data Wrangling

a. Airbnb Data

We use Airbnb listing data for Amsterdam from Dec 05 2021 to Sep 07 2022, published by InsideAirbnb.com, which collects the data and makes it available to the public for research purposes. Our research uses the four quarterly snapshots from the last year (Dec 05 2021, Mar 08 2022, Jun 05 2022, Sep 07 2022).

# Kernel density of Airbnb listings drawn over the neighbourhood polygons
ggplot() + geom_sf(data = neighbourhoods_geo, fill = "grey40") +
  stat_density2d(data = data.frame(st_coordinates(airbnb_geo)), 
                 aes(X, Y, fill = after_stat(level), alpha = after_stat(level)),
                 size = 0.01, bins = 40, geom = 'polygon') +
  scale_fill_gradient(low = "#25CB10", high = "#FA7800", 
                      breaks=c(0.000000003,0.00000003),
                      labels=c("Minimum","Maximum"), name = "Density") +
  scale_alpha(range = c(0.00, 0.35), guide = "none") +
  labs(title = "Density of Short Term Housing, Amsterdam"  , subtitle = "Map 3-1") +
  mapTheme()

The Airbnb listing density map shows that much of the short-term housing is located in the center of Amsterdam, namely in the West, Centrum, and Zuid districts.

b. Other Amsterdam Data

We also download external data from Maps Data City Amsterdam (https://maps.amsterdam.nl/open_geodata/?LANG=en), which provides official construction and social data for the city of Amsterdam. We obtain housing stock, mean rent price, and population data from this website.

i. Housing Stock Bar Plot

new_neighbor <- neighbourhoods_geo

# Long-term housing stock per neighbourhood, in the row order of neighbourhoods_geo
housing_stock <- c(7432,7432,8208,4763,544, 8208,751,7432, 1831, 574, NA, 979, 3465, 1056,8298,1904,
NA, NA, 1869, 2113, 294, 5192
)

# Attach the stock values once and keep only the columns needed for plotting
new_neighbor <- new_neighbor %>%
  mutate(housing_stock = housing_stock)%>%
  dplyr::select(neighbourhood,housing_stock, geometry)

ggplot(new_neighbor, aes(x=neighbourhood, y=housing_stock, fill=neighbourhood))+
geom_bar(stat="identity", color="black")+
scale_fill_manual(values=palette22)+
  theme(text = element_text(size = 5),element_line(size =0.5))+
  labs(title = "Long Term Housing Stock in Each Neighbourhood",  subtitle = "Graph 3-2")

ii. Long-term Mean Rent Price Bar Plot

new_neighborP <- neighbourhoods_geo

# Long-term mean rent per neighbourhood, in the row order of neighbourhoods_geo
housing_price <- c(566,566,529,574,NA, 529,NA,566, 566, NA, NA, NA, 538, NA,560,NA,
529, NA, NA,NA, NA, NA
)

# Attach the rent values once and keep only the columns needed for plotting
new_neighborP <- new_neighborP %>%
  mutate(housing_price = housing_price)%>%
  dplyr::select(neighbourhood,housing_price, geometry)

# Plot from new_neighborP (the data set that holds housing_price) and drop
# neighbourhoods without a rent value so the 10 colours match the 10 bars
ggplot(new_neighborP %>% dplyr::filter(!is.na(housing_price)), aes(x=neighbourhood, y=housing_price, fill=neighbourhood))+
geom_bar(stat="identity", color="black")+
scale_fill_manual(values=c("#00988e", "#008f8c", "#008689", "#007d85", "#077480", "#146b79", "#1d6272", "#23596a", "#275061", "#2a4858"))+
  theme(text = element_text(size = 5),element_line(size =0.5))+
  labs(title = "Long Term Housing Rent in Each Neighbourhood" , subtitle = "Graph 3-3")

iii. Relationship Between Airbnb Listings and Transit Stations

# Buffer of 390 units around each transit stop (units follow the layer's CRS)
colony_10km <-  st_buffer(trans_stops, 390)

ggplot() +
  geom_sf(data = neighbourhoods_geo) +
  geom_sf(data = airbnb_geo, color = '#A5ABC2')+
  geom_sf(data = colony_10km, color = 'red',fill=NA)+
  labs(title = "Airbnb Listings and Transit Stations" , subtitle = "Map 3-4") +
  mapTheme()

The transit stations map shows that most of the stations are located in central Amsterdam. Based on this, we assume that the distance from each Airbnb listing to the nearest stations may play a large role in setting the price.
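The distance features used later (for example trans_nn1 to trans_nn5) come from a helper called nn_function, which is not defined in this report. The sketch below is a hypothetical base R stand-in, `knn_mean_dist`, written under the assumption that the feature is the average Euclidean distance from each listing to its k nearest stops in projected coordinates. The coordinates are made up for illustration.

```r
# Mean Euclidean distance from each row of `from` to its k nearest rows of `to`.
# Both arguments are numeric matrices with two columns (projected X and Y).
knn_mean_dist <- function(from, to, k) {
  apply(from, 1, function(p) {
    d <- sqrt((to[, 1] - p[1])^2 + (to[, 2] - p[2])^2)
    mean(sort(d)[seq_len(k)])
  })
}

# Two hypothetical listings and three hypothetical transit stops
listings_xy <- matrix(c(0, 0,
                        10, 0), ncol = 2, byrow = TRUE)
stops_xy <- matrix(c(3, 4,
                     0, 5,
                     10, 2), ncol = 2, byrow = TRUE)

nn1 <- knn_mean_dist(listings_xy, stops_xy, k = 1)  # distance to the nearest stop
nn2 <- knn_mean_dist(listings_xy, stops_xy, k = 2)  # mean distance to the 2 nearest stops
print(nn1)
print(nn2)
```

Averaging over several neighbors (k > 1) gives a smoother measure of exposure than the single nearest stop, which is presumably why several values of k are computed.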

4 Exploratory Data Analysis

a. Correlation Matrix

#airbnb_geo[ c(15,25,28,29, 31, 32:40, 42:45, 47:49, 52:58, 61:65)] <- sapply(airbnb_geo[c(15,25,28,29, 31, 32:39, 42:45, 47:49, 52:58, 61:65)], as.numeric)

#airbnb_geo[ c(host_listings_count, host_total_listings_count,accommodates,bedrooms,beds, price, 32:39, 42:45, 47:49, 52:58, 61:65)] <- sapply(airbnb_geo[c(15,25,28,29, 31, 32:40, 42:45, 47:49, 52:58, 61:65)], as.numeric, na.rm = T)

airbnb_geo <- airbnb_geo %>% #nearest neighbor distance
  mutate(
    landmark_nn1 = nn_function(st_coordinates(st_centroid(airbnb_geo)), st_coordinates(st_centroid(historical)), 1),
      landmark_nn2 = nn_function(st_coordinates(st_centroid(airbnb_geo)), st_coordinates(st_centroid(historical)), 2), 
      landmark_nn3 = nn_function(st_coordinates(st_centroid(airbnb_geo)), st_coordinates(st_centroid(historical)), 3), 
      landmark_nn4 = nn_function(st_coordinates(st_centroid(airbnb_geo)), st_coordinates(st_centroid(historical)), 4), 
     landmark_nn5 = nn_function(st_coordinates(st_centroid(airbnb_geo)), st_coordinates(st_centroid(historical)), 5))  
 

airbnb_geo <- airbnb_geo %>% #nearest neighbor distance
  mutate(
    trans_nn1 = nn_function(st_coordinates(st_centroid(airbnb_geo)), st_coordinates(st_centroid(trans_stops)), 1),
      trans_nn2 = nn_function(st_coordinates(st_centroid(airbnb_geo)), st_coordinates(st_centroid(trans_stops)), 2), 
      trans_nn3 = nn_function(st_coordinates(st_centroid(airbnb_geo)), st_coordinates(st_centroid(trans_stops)), 3), 
      trans_nn4 = nn_function(st_coordinates(st_centroid(airbnb_geo)), st_coordinates(st_centroid(trans_stops)), 4), 
     trans_nn5 = nn_function(st_coordinates(st_centroid(airbnb_geo)), st_coordinates(st_centroid(trans_stops)), 5))  
 

#historical

numeric_listings <- airbnb_geo  %>%
  dplyr::select(-host_id,-id,-geometry)

numericVars <- 
  select_if(st_drop_geometry(numeric_listings), is.numeric) %>% na.omit() 
  

ggcorrplot( 
  round(cor(numericVars), 1), 
  p.mat = cor_pmat(numericVars),
  colors = c("#25CB10", "white", "#FA7800"),
  lab_size = 1,
  tl.cex = 5,
  type="lower",
  insig = c("pch", "blank"), pch = 1, pch.col = "black", pch.cex =1) +  
  labs(title = "Correlation across numeric variables")+
  theme(text = element_text(size = 5),element_line(size =0.5))

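Based on the correlation matrix, we observe many highly correlated variables in the data set, such as the trans_nn variables, the landmark_nn variables, availability_30, and availability_60. When building the regression models, we therefore included only a subset of these variables (for example, landmark_nn3, landmark_nn4, trans_nn3, trans_nn5, and availability_30 in reg.2).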
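One way to list highly correlated pairs explicitly, rather than reading them off the plot, is sketched below. The data are simulated and the variable names (`nn1`, `nn2`, `other`) and the 0.8 threshold are illustrative assumptions, not values taken from the project.

```r
set.seed(1)
n <- 200
a <- rnorm(n)

# Simulated stand-in for the numeric predictors
vars <- data.frame(
  nn1   = a,
  nn2   = a + rnorm(n, sd = 0.1),  # nearly a copy of nn1
  other = rnorm(n)                 # unrelated variable
)

cm <- cor(vars)

# Keep each pair once (upper triangle) when |r| exceeds the threshold
idx <- which(abs(cm) > 0.8 & upper.tri(cm), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r    = cm[idx]
)
print(high_pairs)
```

Pairs flagged this way are candidates for keeping only one member in the regression.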

b. Analyzing Associations With Independent Variables

ggplot(airbnb_geo, aes(x=number_of_reviews, y=price)) + 
  geom_point()+
  geom_smooth(method=lm, colour="red") +
  labs( title = 'Price as a Function of Number of Reviews',subtitle = sprintf('correlation = %s',round(cor(airbnb_geo$number_of_reviews, airbnb_geo$price), 2)),  caption = 'Figure 4-1')+
  plotTheme()

ggplot(airbnb_geo, aes(x=review_scores_value, y=price)) + 
  geom_point()+
  geom_smooth(method=lm, colour="red") +
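  # Subtitle reports the correlation for the variable actually plotted (review_scores_value)
  labs( title = 'Price as a Function of Review Scores',subtitle = sprintf('correlation = %s',round(cor(airbnb_geo$review_scores_value, airbnb_geo$price, use = "complete.obs"), 2)),  caption = 'Figure 4-2')+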
  plotTheme()

We were interested in the effect of reviews on listing prices; however, neither the number of reviews nor the review scores show a strong correlation with our dependent variable (price).

5 Feature Engineering

a. Creating Dummy Variables

In this section, we created 6 dummy variables to improve the accuracy of the regression: wifi, tv, private, dryer, quite, and clean. We picked the first four because we considered them the features that guests look for most. We created each variable by searching the listing description column for the keyword and assigning a value of 1 if it is present and 0 otherwise. The remaining two, quite and clean, are the two most frequent words in the description column.
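The keyword search described above can be sketched as follows. This is a minimal base R illustration with made-up descriptions; the report's actual matching code is not shown, so the case-insensitive substring match is an assumption.

```r
# Hypothetical listing descriptions
listings <- data.frame(
  id = 1:3,
  description = c("Quiet room with WiFi and TV",
                  "Private studio, very clean",
                  "Boat near the center"),
  stringsAsFactors = FALSE
)

# Keywords follow the variable names used in the report
keywords <- c("wifi", "tv", "private", "dryer", "quite", "clean")

# One 0/1 column per keyword: 1 if the description contains it, 0 otherwise
for (kw in keywords) {
  listings[[kw]] <- as.integer(grepl(kw, listings$description, ignore.case = TRUE))
}
print(listings[, c("id", keywords)])
```

Because this is literal substring matching, the pattern "quite" does not match "Quiet" in the first description, and a short pattern such as "tv" would also match inside longer words. Word-boundary patterns (for example `\\btv\\b`) are one way to tighten the match.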

b. Text Analysis of Reviews

Reviews from former guests can be important information when someone decides which Airbnb apartment to choose. For this reason, we believe that review text may be correlated with apartment prices in some cases.

review_scores <- airbnb %>%
  mutate_if(is.character,as.numeric)%>% 
  dplyr::select("id","review_scores_rating","review_scores_accuracy","review_scores_cleanliness","review_scores_checkin","review_scores_communication","review_scores_location", "review_scores_value") %>%   
  st_drop_geometry(.) %>%
  na.omit()

review_scores.id <-
  review_scores %>%
  group_by(id) %>%
  summarise_at(c("review_scores_rating", "review_scores_accuracy", "review_scores_cleanliness", "review_scores_checkin", "review_scores_communication", "review_scores_location", "review_scores_value"), mean, na.rm = TRUE)

id.nb <- 
  subset(airbnb_geo[c("id", "neighbourhood_cleansed", "geometry")])%>%
  group_by(id)

review_scores.nb <- 
  merge(id.nb, review_scores.id, by = "id")%>%
  group_by(neighbourhood_cleansed)%>%
  summarise_at(c("review_scores_rating", "review_scores_accuracy", "review_scores_cleanliness", "review_scores_checkin", "review_scores_communication", "review_scores_location", "review_scores_value"), mean, na.rm = TRUE)%>%
  st_drop_geometry()

review_scores.nb <- merge(x = review_scores.nb, y = nb, by.x = "neighbourhood_cleansed", by.y = "neighbourhood") %>%st_as_sf()
ggplot() +
#  geom_sf(data = nb, color = "#767E8E", fill = "transparent")

  geom_sf(data = review_scores.nb, aes(fill = q5(review_scores_rating)))+
  scale_fill_manual(values = palette5,
                    labels = qBr(review_scores.nb, "review_scores_rating"),
                    name = "Rating\n(Quintile Breaks)") +
  labs(title = "Airbnb Review Rating in Each Neighborhood", 
       subtitle = "Map 5-1")

i. K-means

First, we run K-means clustering on the neighborhoods to classify them by the review scores of their apartments. Each apartment has 7 review dimensions (rating, accuracy, cleanliness, checkin, communication, location, and value), which are selected with review_scores.nb[c(2:8)].

data_scaled<- scale(review_scores.nb[c(2:8)]%>%st_drop_geometry()) 
distance <- get_dist(data_scaled)
fviz_dist(distance, gradient = list(low = "#00AFBB", mid = "white", high = "#EB4C60")) 

set.seed(123)
k2 <- kmeans(data_scaled, centers = 2, nstart = 25)
k3 <- kmeans(data_scaled, centers = 3, nstart = 25)
k4 <- kmeans(data_scaled, centers = 4, nstart = 25)
k5 <- kmeans(data_scaled, centers = 5, nstart = 25)

p1 <- fviz_cluster(k2, geom = "point", data = data_scaled, labelsize = 1, ellipse.type = "convex", ellipse.alpha = 0 ) + ggtitle("k = 2") +
   theme(axis.line = element_line(),
   panel.grid.major = element_blank(),
   panel.grid.minor = element_blank(),
   panel.border = element_blank(),
   panel.background = element_blank())
p2 <- fviz_cluster(k3, geom = "point",  data = data_scaled, labelsize = 1, ellipse.type = "convex", ellipse.alpha = 0 ) + ggtitle("k = 3") +
   theme(axis.line = element_line(),
   panel.grid.major = element_blank(),
   panel.grid.minor = element_blank(),
   panel.border = element_blank(),
   panel.background = element_blank())
p3 <- fviz_cluster(k4, geom = "point",  data = data_scaled, labelsize = 1, ellipse.type = "convex", ellipse.alpha = 0 ) + ggtitle("k = 4") +
   theme(axis.line = element_line(),
   panel.grid.major = element_blank(),
   panel.grid.minor = element_blank(),
   panel.border = element_blank(),
   panel.background = element_blank())
p4 <- fviz_cluster(k5, geom = "point",  data = data_scaled, labelsize = 1, ellipse.type = "convex", ellipse.alpha = 0 ) + ggtitle("k = 5") +
   theme(axis.line = element_line(),
   panel.grid.major = element_blank(),
   panel.grid.minor = element_blank(),
   panel.border = element_blank(),
   panel.background = element_blank())

grid.arrange(p1, p2, p3, p4, nrow = 2)

The clustering with our chosen variables proceeds as follows. We begin by scaling, or normalizing, the data, which places every variable on a scale with a mean of 0 and a standard deviation of 1. All variables must be measured on the same scale so that they carry equal weight in the next step, which is to calculate the Euclidean distance between each pair of neighborhoods across the variables. The fviz_dist function can be used to visualize this distance matrix.

To maximize the differences between groups while minimizing the differences among observations within each group, we choose the solution with 4 clusters (k = 4) and examine its characteristics below.
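The trade-off between within-group and between-group variation can be inspected numerically for each candidate k. The sketch below uses the built-in iris measurements as a stand-in for the scaled review scores, so the numbers it prints say nothing about the Amsterdam data; it only illustrates the comparison.

```r
set.seed(123)

# Stand-in data: four numeric columns, scaled to mean 0 and sd 1
x <- scale(iris[, 1:4])

# Fit k-means for the same candidate values of k as in the report
ks <- 2:5
fits <- lapply(ks, function(k) kmeans(x, centers = k, nstart = 25))

# Total within-cluster sum of squares and share of variance between clusters
summary_k <- data.frame(
  k             = ks,
  within_ss     = sapply(fits, function(f) f$tot.withinss),
  between_share = sapply(fits, function(f) f$betweenss / f$totss)
)
print(summary_k)
```

Since the between-cluster share generally keeps rising as k grows, a common heuristic is to pick the k after which the improvement flattens out (the "elbow") rather than the largest k.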

cltclusters<- review_scores.nb %>%
  mutate(cluster4 = k4$cluster) %>%
  group_by(cluster4) %>%
  summarise_all("mean") %>%
  select(-c("neighbourhood_cleansed"))
kable(x=cltclusters)%>%kable_minimal()
Mean review scores by cluster (column names omit the review_scores_ prefix; the output's neighbourhood_group column is NA for every cluster, and its truncated geometry column is not shown):

cluster4  rating    accuracy  cleanliness  checkin   communication  location  value
1         4.753112  4.792388  4.701025     4.851735  4.839778       4.647258  4.619955
2         4.818214  4.847455  4.761887     4.880969  4.894309       4.809295  4.651767
3         4.834702  4.859731  4.767027     4.903170  4.913189       4.679830  4.693334
4         4.813629  4.846609  4.778137     4.879906  4.865706       4.601134  4.672704

We then join the cluster assignments to the neighborhoods, as shown in Map 5-2.

cltdata <- review_scores.nb %>%
  mutate(cluster4 = k4$cluster) %>%st_as_sf()

# Cluster ids are categories, so map them as a factor instead of quintile breaks
ggplot() +
  geom_sf(data = cltdata, aes(fill = factor(cluster4)))+
  scale_fill_manual(values = palette5,
                    name = "Cluster") +
  labs(title = "Airbnb Cluster in Each Neighborhood", subtitle = "Map 5-2")

ii. Data cleaning

We begin our text analysis with data cleaning. First, we join all review text into one data set, then trim the text and convert it to lowercase. Second, we remove punctuation, followed by stop words in English, French, and Spanish. Finally, we count word frequencies and remove words that occur fewer than 5 times.
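The cleaning steps above can be sketched in base R. The report's own cleaning code is not shown, so this is only an illustration: the reviews and the stop word list are tiny hypothetical examples, and the output columns `word` and `nn` are named to match the `words` table used in the plots below.

```r
# Hypothetical review texts
reviews <- c("Great location, very CLEAN!",
             "Clean room; great host.",
             "Muy bien, great time")

# Tiny illustrative stop word list (a real one would cover three languages)
stop_words <- c("very", "the", "a", "muy", "bien")

# 1) trim and lowercase, 2) replace punctuation with spaces
clean_text <- tolower(trimws(reviews))
clean_text <- gsub("[[:punct:]]", " ", clean_text)

# 3) split into words and drop stop words
tokens <- unlist(strsplit(clean_text, "\\s+"))
tokens <- tokens[tokens != "" & !(tokens %in% stop_words)]

# 4) count frequencies and drop rare words
counts <- table(tokens)
words <- data.frame(word = names(counts), nn = as.integer(counts),
                    stringsAsFactors = FALSE)
min_count <- 2  # the report uses 5; lowered here because the toy data is tiny
words <- words[words$nn >= min_count, ]
words <- words[order(-words$nn), ]
print(words)
```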

iii. Word cloud

Our data set contains a large number of reviews (about 1.2 million rows). After cleaning, we calculated the most frequently mentioned words, shown in the figure below. Some words, such as “time”, “location”, and “clean”, appear very often in the reviews.

set.seed(12345)
wordcloud2(data=words, size=1.6, color='random-dark')

iv. Text analysis

words %>%
  filter(nn >= 5000) %>% 
  arrange(nn) %>%
#  group_by(d) %>%
  top_n(25, nn) %>%
  ungroup() %>%
  mutate(word = factor(word, unique(word))) %>%  # order bars by count rather than alphabetically
  ggplot(aes(word, nn)) +
  geom_col(show.legend = FALSE) +
#  facet_wrap(~ d, scales = "free", ncol = 3) +
  coord_flip() +
  labs(x = NULL, 
       y = "Words counts",
       title = "Review text top mentioned words count",
         subtitle = "Figure 5-1")

To examine whether the top mentioned words are correlated with price, we extracted a sample data set with 1000 words from the full reviews data set and created a binary column that records whether a review contains a given word, for example “clean” (1 if it does, 0 otherwise). We then plot the mean price of apartments in the two groups, with and without “clean” mentioned in their reviews.

w = "clean"

churn <-
  dat %>%
  mutate(binary = ifelse(str_detect(dat$comments, w) == TRUE, 1, 0))

churn <-
  merge(x = churn, y = dec_price.id, by.x = "listing_id", by.y = "id", all.x = TRUE) %>%
  na.omit()

churn %>%
  dplyr::select(price, binary) %>%
  gather(Variable, value, price) %>%
  ggplot(aes(binary, value, fill=binary)) + 
    geom_bar(position = "dodge", stat = "summary", fun = "mean") + 
#    facet_wrap(~Variable, scales = "free") +
#    scale_fill_manual(values = palette2) +
    labs(x=w, y="Mean", 
         title = "Prices associations with whether contain clean in reviews",
         subtitle = "Figure 5-2") +
    theme(legend.position = "none")

Figure 5-2 suggests that prices may differ slightly depending on whether an apartment's reviews contain “clean”. We will add the words that show a relationship with price to our regression model as dummy variables.

6 Model Result: Short-term Evaluation

a. Airbnb Price Regression Model

#split the dataset
#airbnb_geo <- airbnb_geo%>%
  #filter(!(property_type %in% c("Cave"," Entire home/apt", "Shared room in bed and breakfast", "Shared room in boat")) )
  

airbnb_geo <-airbnb_geo[-c(4598, 10294, 15647, 21832, 4876, 10565, 15907 ,22087, 338 , 6063 ,11617, 17743,3192 , 8927, 14385, 20537,4265,  9976 ,15357, 21539 ),]

set.seed(3456)
inTrain <- createDataPartition(
              y = paste( airbnb_geo$neighbourhood_group, airbnb_geo$neighbourhood_group_cleansed, airbnb_geo$property_type, airbnb_geo$bathrooms_text
                        ), 
              p = .60, list = FALSE)

listing_training <- airbnb_geo[inTrain,]
listing_test <- airbnb_geo[-inTrain,]


listing_test_new <- listing_test                                # Duplicate test data set
listing_test_new$property_type[which(!(listing_test_new$property_type %in% unique(listing_training$property_type)))] <- NA  # Replace new levels by NA
#data_test_new 

# Without the step above, predict() fails because the test set has new property_type levels: Cave, Entire home/apt, Shared room in bed and breakfast, Shared room in boat
reg.1 <- lm(price ~ ., data = as.data.frame(listing_training) %>%
              dplyr::select( price,host_response_rate, host_acceptance_rate, host_listings_count, host_total_listings_count, neighbourhood_cleansed, property_type, room_type, accommodates, bathrooms_text,  minimum_nights, maximum_nights, availability_30, number_of_reviews, review_scores_rating, review_scores_accuracy, reviews_per_month))

Given the goal of this project, we believe it is necessary to include the following groups of predictors: distance to landmarks, amenity features, and host information. The model is an OLS regression that uses the price column as the dependent variable and is fitted on the training set.
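The fit-on-training, check-on-test workflow can be sketched on simulated data. Everything in this block is an illustrative assumption: the predictors (`accommodates`, `dist`), the coefficients used to simulate prices, and the plain random 60/40 split, which stands in for the stratified split made with createDataPartition above.

```r
set.seed(3456)
n <- 500

# Simulated listings: price depends on capacity and distance, plus noise
dat <- data.frame(
  accommodates = sample(1:6, n, replace = TRUE),
  dist         = runif(n, 0, 5)
)
dat$price <- 50 + 30 * dat$accommodates - 8 * dat$dist + rnorm(n, sd = 20)

# 60% of rows for training, the rest for testing
train_idx <- sample(seq_len(n), size = floor(0.6 * n))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Fit OLS on the training set only
fit <- lm(price ~ accommodates + dist, data = train)

# Evaluate on the held-out test set with the mean absolute error
pred <- predict(fit, newdata = test)
mae  <- mean(abs(pred - test$price))
print(coef(fit))
print(mae)
```

Reporting the error on rows the model has not seen gives a more honest picture of predictive accuracy than the in-sample fit statistics.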

reg.2 <- lm(price ~ ., data = as.data.frame(listing_training) %>%
              dplyr::select( price, host_response_rate, host_acceptance_rate, host_listings_count, host_total_listings_count, neighbourhood_cleansed, room_type, accommodates,property_type, bathrooms_text,  minimum_nights, maximum_nights, availability_30, number_of_reviews, review_scores_rating, reviews_per_month, host_has_profile_pic, host_identity_verified, beds,bedrooms
                             ,landmark_nn3,landmark_nn4,trans_nn3,trans_nn5,wifi, tv, private , dryer, quite, clean
                             ))

stargazer(reg.2 ,type = "text", 
          title = "Summary Statistics of Airbnb Price Prediction Model ",
          header = FALSE,
          single.row = TRUE)
## 
## Summary Statistics of Airbnb Price Prediction Model
## ========================================================================================
##                                                                  Dependent variable:    
##                                                              ---------------------------
##                                                                         price           
## ----------------------------------------------------------------------------------------
## host_response_rate                                               -21.519*** (7.316)     
## host_acceptance_rate                                              10.887*** (4.157)     
## host_listings_count                                               -0.219*** (0.084)     
## host_total_listings_count                                         0.239*** (0.075)      
## neighbourhood_cleansedBijlmer-Oost                                 15.290 (15.817)      
## neighbourhood_cleansedBos en Lommer                                -2.823 (11.324)      
## neighbourhood_cleansedBuitenveldert - Zuidas                      -10.416 (12.258)      
## neighbourhood_cleansedCentrum-Oost                               36.638*** (12.755)     
## neighbourhood_cleansedCentrum-West                               40.139*** (12.723)     
## neighbourhood_cleansedDe Aker - Nieuw Sloten                     56.543*** (13.117)     
## neighbourhood_cleansedDe Baarsjes - Oud-West                      20.463* (12.247)      
## neighbourhood_cleansedDe Pijp - Rivierenbuurt                      19.410 (12.140)      
## neighbourhood_cleansedGaasperdam - Driemond                       27.635** (12.676)     
## neighbourhood_cleansedGeuzenveld - Slotermeer                     28.435** (11.969)     
## neighbourhood_cleansedIJburg - Zeeburgereiland                    20.469* (11.445)      
## neighbourhood_cleansedNoord-Oost                                  -13.847 (13.759)      
## neighbourhood_cleansedNoord-West                                 -48.598*** (14.661)    
## neighbourhood_cleansedOostelijk Havengebied - Indische Buurt       -8.499 (12.776)      
## neighbourhood_cleansedOsdorp                                     36.927*** (14.138)     
## neighbourhood_cleansedOud-Noord                                    -6.789 (13.072)      
## neighbourhood_cleansedOud-Oost                                     -1.048 (12.637)      
## neighbourhood_cleansedSlotervaart                                  9.956 (12.940)       
## neighbourhood_cleansedWatergraafsmeer                              -9.156 (11.693)      
## neighbourhood_cleansedWesterpark                                   3.482 (12.701)       
## neighbourhood_cleansedZuid                                        30.115** (12.356)     
## room_typeHotel room                                             -178.730*** (35.165)    
## room_typePrivate room                                           -166.992*** (29.845)    
## room_typeShared room                                              -60.391 (61.789)      
## accommodates                                                      29.027*** (1.305)     
## property_typeBoat                                                  3.069 (34.991)       
## property_typeCamper/RV                                            -109.671 (96.394)     
## property_typeCasa particular                                       28.965 (71.354)      
## property_typeEntire bungalow                                      -18.847 (49.306)      
## property_typeEntire cabin                                         -22.404 (43.899)      
## property_typeEntire chalet                                         -7.072 (41.792)      
## property_typeEntire condo                                          49.097 (34.162)      
## property_typeEntire condominium (condo)                            -7.307 (34.783)      
## property_typeEntire cottage                                        31.370 (40.603)      
## property_typeEntire guest suite                                   -17.713 (35.731)      
## property_typeEntire guesthouse                                     11.470 (35.798)      
## property_typeEntire home                                           0.679 (34.144)       
## property_typeEntire loft                                          64.764* (34.479)      
## property_typeEntire place                                         -31.151 (42.338)      
## property_typeEntire rental unit                                    -1.464 (33.994)      
## property_typeEntire residential home                              -48.800 (34.645)      
## property_typeEntire serviced apartment                             42.679 (34.556)      
## property_typeEntire townhouse                                      -8.662 (34.451)      
## property_typeEntire vacation home                                  27.593 (42.431)      
## property_typeEntire villa                                          55.356 (37.793)      
## property_typeFarm stay                                            -19.524 (44.127)      
## property_typeHouseboat                                             41.218 (34.604)      
## property_typePrivate room                                        116.454** (49.348)     
## property_typePrivate room in barn                                  9.533 (99.304)       
## property_typePrivate room in bed and breakfast                   125.482*** (45.403)    
## property_typePrivate room in boat                                110.052** (45.990)     
## property_typePrivate room in bungalow                            154.773** (68.336)     
## property_typePrivate room in cabin                               132.421** (59.977)     
## property_typePrivate room in casa particular                     101.250** (49.940)     
## property_typePrivate room in condo                               138.354*** (46.104)    
## property_typePrivate room in condominium (condo)                  95.841** (47.148)     
## property_typePrivate room in earthen home                        151.750** (77.143)     
## property_typePrivate room in farm stay                           146.688*** (49.378)    
## property_typePrivate room in guest suite                         123.290*** (45.749)    
## property_typePrivate room in guesthouse                          139.173*** (49.688)    
## property_typePrivate room in home                                153.628*** (45.691)    
## property_typePrivate room in hostel                              144.329*** (49.214)    
## property_typePrivate room in houseboat                           125.286*** (45.672)    
## property_typePrivate room in loft                                103.473** (46.335)     
## property_typePrivate room in nature lodge                        474.245*** (77.451)    
## property_typePrivate room in rental unit                         108.490** (45.435)     
## property_typePrivate room in residential home                     96.483** (45.980)     
## property_typePrivate room in serviced apartment                  246.638*** (48.656)    
## property_typePrivate room in tiny home                           129.505** (59.823)     
## property_typePrivate room in tiny house                           112.773* (62.996)     
## property_typePrivate room in townhouse                           123.552*** (45.663)    
## property_typePrivate room in villa                               155.870*** (51.790)    
## property_typeRoom in aparthotel                                  136.175*** (39.660)    
## property_typeRoom in bed and breakfast                           147.678*** (50.310)    
## property_typeRoom in boutique hotel                              176.837*** (45.686)    
## property_typeRoom in hostel                                      138.154*** (51.918)    
## property_typeRoom in hotel                                       170.102*** (45.862)    
## property_typeRoom in serviced apartment                          290.827*** (51.064)    
## property_typeShared room in bed and breakfast                      28.941 (83.324)      
## property_typeShared room in home                                   7.163 (80.666)       
## property_typeShared room in hostel                                 1.358 (55.252)       
## property_typeShared room in houseboat                             119.852* (61.380)     
## property_typeShared room in rental unit                            -0.721 (58.525)      
## property_typeShared room in residential home                                            
## property_typeTent                                                  -9.205 (77.093)      
## property_typeTiny home                                             16.573 (41.908)      
## property_typeTiny house                                            1.790 (70.982)       
## property_typeTower                                               198.824*** (47.775)    
## property_typeWindmill                                             129.586* (71.821)     
## property_typeYurt                                                 -34.042 (94.966)      
## bathrooms_text0 baths                                             -44.543 (30.172)      
## bathrooms_text0 shared baths                                      -34.846 (31.176)      
## bathrooms_text1 bath                                              -11.523 (24.097)      
## bathrooms_text1 private bath                                       -7.950 (23.859)      
## bathrooms_text1 shared bath                                       -16.907 (24.189)      
## bathrooms_text1.5 baths                                            -2.863 (24.081)      
## bathrooms_text1.5 shared baths                                    -16.854 (24.316)      
## bathrooms_text2 baths                                              37.375 (24.408)      
## bathrooms_text2 shared baths                                       -4.818 (32.255)      
## bathrooms_text2.5 baths                                          83.452*** (24.920)     
## bathrooms_text2.5 shared baths                                   294.456*** (95.721)    
## bathrooms_text3 baths                                            108.671*** (25.655)    
## bathrooms_text3 shared baths                                      -48.705 (39.929)      
## bathrooms_text3.5 baths                                          84.128*** (29.347)     
## bathrooms_text3.5 shared baths                                     47.428 (47.496)      
## bathrooms_text4 baths                                              20.972 (44.329)      
## bathrooms_text4.5 baths                                          277.202*** (43.751)    
## bathrooms_text5 baths                                              47.393 (38.285)      
## bathrooms_text5.5 baths                                          135.792** (58.737)     
## bathrooms_textHalf-bath                                           -31.039 (37.636)      
## bathrooms_textPrivate half-bath                                   -31.607 (35.474)      
## bathrooms_textShared half-bath                                   -99.405*** (28.684)    
## minimum_nights                                                      0.025 (0.033)       
## maximum_nights                                                     -0.003* (0.002)      
## availability_30                                                   2.028*** (0.133)      
## number_of_reviews                                                 -0.078*** (0.013)     
## review_scores_rating                                              14.760*** (3.265)     
## reviews_per_month                                                  -0.717 (0.490)       
## host_has_profile_pict                                            -42.579*** (15.313)    
## host_identity_verifiedt                                            6.871** (2.763)      
## beds                                                              -7.800*** (1.206)     
## bedrooms                                                          28.168*** (1.985)     
## landmark_nn3                                                  5,193.457*** (1,747.448)  
## landmark_nn4                                                  -6,136.821*** (1,697.320) 
## trans_nn3                                                       2,411.419 (1,555.101)   
## trans_nn5                                                      -2,219.012 (1,670.408)   
## wifi                                                              15.326*** (3.285)     
## tv                                                                 5.034* (2.574)       
## private                                                           -5.551** (2.308)      
## dryer                                                              -1.852 (3.825)       
## quite                                                               1.636 (5.590)       
## clean                                                               1.156 (3.215)       
## Constant                                                           67.164 (49.536)      
## ----------------------------------------------------------------------------------------
## Observations                                                            8,866           
## R2                                                                      0.512           
## Adjusted R2                                                             0.505           
## Residual Std. Error                                              88.185 (df = 8730)     
## F Statistic                                                  67.913*** (df = 135; 8730) 
## ========================================================================================
## Note:                                                        *p<0.1; **p<0.05; ***p<0.01

b. Airbnb Demand Regression Model

To predict the availability of each Airbnb listing, we built an OLS model whose predictors describe the listing's reviews, host information, and room features. The model is fitted on the training set.

reg_demand1 <- lm(availability_30 ~ ., data = as.data.frame(listing_training) %>%
              dplyr::select( availability_30, host_response_rate, host_acceptance_rate, host_listings_count, host_total_listings_count, neighbourhood_cleansed, room_type, accommodates,property_type, bathrooms_text,  minimum_nights, maximum_nights, price, number_of_reviews, review_scores_rating, reviews_per_month, host_has_profile_pic, host_identity_verified, beds,bedrooms,landmark_nn3,landmark_nn4,trans_nn3,trans_nn5,wifi, tv, private , dryer, quite, clean,availability_90 ))

stargazer(reg_demand1 ,type = "text", 
          title = "Summary Statistics of the Airbnb Availability Model",
          header = FALSE,
          single.row = TRUE)
## 
## Summary Statistics of the Airbnb Availability Model
## ========================================================================================
##                                                                  Dependent variable:    
##                                                              ---------------------------
##                                                                    availability_30      
## ----------------------------------------------------------------------------------------
## host_response_rate                                                 -0.296 (0.353)       
## host_acceptance_rate                                               0.506** (0.201)      
## host_listings_count                                                0.007* (0.004)       
## host_total_listings_count                                          -0.005 (0.004)       
## neighbourhood_cleansedBijlmer-Oost                                 1.572** (0.762)      
## neighbourhood_cleansedBos en Lommer                                1.121** (0.546)      
## neighbourhood_cleansedBuitenveldert - Zuidas                        0.498 (0.591)       
## neighbourhood_cleansedCentrum-Oost                                 1.369** (0.615)      
## neighbourhood_cleansedCentrum-West                                 1.431** (0.614)      
## neighbourhood_cleansedDe Aker - Nieuw Sloten                       -0.728 (0.633)       
## neighbourhood_cleansedDe Baarsjes - Oud-West                       1.288** (0.590)      
## neighbourhood_cleansedDe Pijp - Rivierenbuurt                      1.269** (0.585)      
## neighbourhood_cleansedGaasperdam - Driemond                        -0.522 (0.611)       
## neighbourhood_cleansedGeuzenveld - Slotermeer                       0.374 (0.577)       
## neighbourhood_cleansedIJburg - Zeeburgereiland                     -0.181 (0.552)       
## neighbourhood_cleansedNoord-Oost                                  1.881*** (0.663)      
## neighbourhood_cleansedNoord-West                                   1.441** (0.707)      
## neighbourhood_cleansedOostelijk Havengebied - Indische Buurt      1.627*** (0.616)      
## neighbourhood_cleansedOsdorp                                       -0.760 (0.682)       
## neighbourhood_cleansedOud-Noord                                   1.765*** (0.630)      
## neighbourhood_cleansedOud-Oost                                     1.245** (0.609)      
## neighbourhood_cleansedSlotervaart                                  1.064* (0.624)       
## neighbourhood_cleansedWatergraafsmeer                              1.299** (0.564)      
## neighbourhood_cleansedWesterpark                                   1.280** (0.612)      
## neighbourhood_cleansedZuid                                          0.964 (0.596)       
## room_typeHotel room                                                 2.208 (1.697)       
## room_typePrivate room                                              2.690* (1.441)       
## room_typeShared room                                               -2.798 (2.978)       
## accommodates                                                       0.143** (0.065)      
## property_typeBoat                                                   0.950 (1.686)       
## property_typeCamper/RV                                            11.829** (4.644)      
## property_typeCasa particular                                       -2.292 (3.439)       
## property_typeEntire bungalow                                       -0.026 (2.376)       
## property_typeEntire cabin                                          -0.976 (2.116)       
## property_typeEntire chalet                                         -0.006 (2.014)       
## property_typeEntire condo                                           0.189 (1.646)       
## property_typeEntire condominium (condo)                             0.787 (1.676)       
## property_typeEntire cottage                                        -0.836 (1.957)       
## property_typeEntire guest suite                                    -0.319 (1.722)       
## property_typeEntire guesthouse                                      0.751 (1.725)       
## property_typeEntire home                                           -0.340 (1.645)       
## property_typeEntire loft                                           -0.505 (1.662)       
## property_typeEntire place                                           1.189 (2.040)       
## property_typeEntire rental unit                                     0.343 (1.638)       
## property_typeEntire residential home                                1.311 (1.670)       
## property_typeEntire serviced apartment                              1.393 (1.666)       
## property_typeEntire townhouse                                      -0.167 (1.660)       
## property_typeEntire vacation home                                  -0.772 (2.045)       
## property_typeEntire villa                                           0.656 (1.821)       
## property_typeFarm stay                                              2.317 (2.126)       
## property_typeHouseboat                                              0.852 (1.668)       
## property_typePrivate room                                          -2.289 (2.379)       
## property_typePrivate room in barn                                   5.811 (4.785)       
## property_typePrivate room in bed and breakfast                     -3.032 (2.189)       
## property_typePrivate room in boat                                  -0.512 (2.217)       
## property_typePrivate room in bungalow                              -3.390 (3.294)       
## property_typePrivate room in cabin                                 -1.973 (2.891)       
## property_typePrivate room in casa particular                       -3.889 (2.407)       
## property_typePrivate room in condo                                -4.900** (2.223)      
## property_typePrivate room in condominium (condo)                   -1.510 (2.273)       
## property_typePrivate room in earthen home                          -5.854 (3.718)       
## property_typePrivate room in farm stay                             -2.766 (2.381)       
## property_typePrivate room in guest suite                           -2.732 (2.206)       
## property_typePrivate room in guesthouse                            -2.481 (2.395)       
## property_typePrivate room in home                                  -3.922* (2.203)      
## property_typePrivate room in hostel                                -0.920 (2.373)       
## property_typePrivate room in houseboat                             -2.320 (2.202)       
## property_typePrivate room in loft                                  -2.628 (2.234)       
## property_typePrivate room in nature lodge                         -8.830** (3.740)      
## property_typePrivate room in rental unit                           -2.734 (2.190)       
## property_typePrivate room in residential home                      -1.795 (2.216)       
## property_typePrivate room in serviced apartment                     1.015 (2.348)       
## property_typePrivate room in tiny home                              2.121 (2.884)       
## property_typePrivate room in tiny house                            -5.690* (3.036)      
## property_typePrivate room in townhouse                             -3.488 (2.201)       
## property_typePrivate room in villa                                 -0.615 (2.497)       
## property_typeRoom in aparthotel                                     0.376 (1.913)       
## property_typeRoom in bed and breakfast                             -0.911 (2.426)       
## property_typeRoom in boutique hotel                                -0.951 (2.204)       
## property_typeRoom in hostel                                        -3.060 (2.504)       
## property_typeRoom in hotel                                         -1.522 (2.212)       
## property_typeRoom in serviced apartment                            -3.315 (2.466)       
## property_typeShared room in bed and breakfast                       3.997 (4.016)       
## property_typeShared room in home                                   -0.935 (3.887)       
## property_typeShared room in hostel                                7.569*** (2.662)      
## property_typeShared room in houseboat                              4.966* (2.959)       
## property_typeShared room in rental unit                            6.670** (2.820)      
## property_typeShared room in residential home                                            
## property_typeTent                                                   2.520 (3.715)       
## property_typeTiny home                                             -2.566 (2.020)       
## property_typeTiny house                                            -5.740* (3.421)      
## property_typeTower                                                  1.486 (2.305)       
## property_typeWindmill                                               0.392 (3.462)       
## property_typeYurt                                                   0.902 (4.577)       
## bathrooms_text0 baths                                              -1.120 (1.454)       
## bathrooms_text0 shared baths                                      -3.463** (1.503)      
## bathrooms_text1 bath                                                0.545 (1.161)       
## bathrooms_text1 private bath                                        0.509 (1.150)       
## bathrooms_text1 shared bath                                         0.152 (1.166)       
## bathrooms_text1.5 baths                                             0.491 (1.160)       
## bathrooms_text1.5 shared baths                                      0.775 (1.172)       
## bathrooms_text2 baths                                               0.092 (1.176)       
## bathrooms_text2 shared baths                                       -1.676 (1.554)       
## bathrooms_text2.5 baths                                             0.143 (1.202)       
## bathrooms_text2.5 shared baths                                      5.418 (4.616)       
## bathrooms_text3 baths                                               0.208 (1.238)       
## bathrooms_text3 shared baths                                        3.005 (1.925)       
## bathrooms_text3.5 baths                                             1.183 (1.415)       
## bathrooms_text3.5 shared baths                                     -1.051 (2.289)       
## bathrooms_text4 baths                                               0.538 (2.136)       
## bathrooms_text4.5 baths                                            -2.103 (2.113)       
## bathrooms_text5 baths                                               2.097 (1.845)       
## bathrooms_text5.5 baths                                            -0.207 (2.831)       
## bathrooms_textHalf-bath                                             0.048 (1.814)       
## bathrooms_textPrivate half-bath                                     1.479 (1.710)       
## bathrooms_textShared half-bath                                     -1.026 (1.383)       
## minimum_nights                                                      0.001 (0.002)       
## maximum_nights                                                   0.0003*** (0.0001)     
## price                                                               0.001 (0.001)       
## number_of_reviews                                                 -0.003*** (0.001)     
## review_scores_rating                                               -0.123 (0.158)       
## reviews_per_month                                                 0.093*** (0.024)      
## host_has_profile_pict                                              -0.506 (0.738)       
## host_identity_verifiedt                                            -0.251* (0.133)      
## beds                                                               0.101* (0.058)       
## bedrooms                                                          -0.321*** (0.097)     
## landmark_nn3                                                      -80.707 (84.248)      
## landmark_nn4                                                      108.208 (81.847)      
## trans_nn3                                                         137.439* (74.944)     
## trans_nn5                                                        -174.220** (80.500)    
## wifi                                                                0.020 (0.159)       
## tv                                                                 -0.134 (0.124)       
## private                                                             0.111 (0.111)       
## dryer                                                               0.071 (0.184)       
## quite                                                               0.360 (0.269)       
## clean                                                               0.085 (0.155)       
## availability_90                                                   0.244*** (0.002)      
## Constant                                                           -1.074 (2.389)       
## ----------------------------------------------------------------------------------------
## Observations                                                            8,866           
## R2                                                                      0.729           
## Adjusted R2                                                             0.724           
## Residual Std. Error                                               4.250 (df = 8729)     
## F Statistic                                                  172.286*** (df = 136; 8729)
## ========================================================================================
## Note:                                                        *p<0.1; **p<0.05; ***p<0.01

c. Short-Term Income Analysis

test_set_result <-
  listing_test_new %>%
  # Price model: predictions and error measures on the test set
  mutate(price.Predict = predict(reg.2, listing_test_new),
         price.Error = price.Predict - price,
         price.AbsError = abs(price.Predict - price),
         # Note: the absolute percentage error is scaled by the predicted price
         price.APE = (abs(price.Predict - price)) / price.Predict) %>%
  # Availability (demand) model: predictions and error measures on the test set
  mutate(Regression = "demand Regression",
         demand.Predict = predict(reg_demand1, listing_test_new),
         demand.Error = demand.Predict - availability_30,
         demand.AbsError = abs(demand.Predict - availability_30),
         demand.APE = (abs(demand.Predict - availability_30)) / demand.Predict)

test_set_result<- test_set_result%>%  
  dplyr::select(-bathrooms, -neighbourhood_group_cleansed,-calendar_updated)

test_set_result <- test_set_result%>%
  na.omit()


MAE = mean(listing_test$price.AbsError, na.rm = T)
MAPE = mean(listing_test$price.APE, na.rm = T)


Revenue_ <- test_set_result$price.Predict * test_set_result$demand.Predict
Airbnb_Listing <- test_set_result$name
Location <- test_set_result$host_neighbourhood
benefit_an <- data.frame(Airbnb_Listing,Location,Revenue_)%>%
  filter(Revenue_ > 0)
benefit_an <- benefit_an %>% mutate_all(na_if,"")%>%
  na.omit()


benefit_an1 <- benefit_an %>%
  group_by(Location)%>%
  summarise(mean_Revenue = mean(Revenue_))

We computed the average Airbnb revenue in each district by multiplying the predicted price column by the predicted availability column and then subtracting the 3% app service fee. Finally, we placed this column side by side with the average long-term rent to see the difference in revenue.
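The calculation described above can be sketched on a toy data set. This is only an illustration under stated assumptions: the four listings, their predicted prices and predicted days are made up, and the names `listings`, `price_predict`, `days_predict` and `service_fee` are hypothetical and do not appear in our pipeline. The two long-term rents are the Centrum and West values used in the comparison table.

```r
# Hypothetical predictions for four listings (values invented for illustration).
listings <- data.frame(
  district      = c("Centrum", "Centrum", "West", "West"),
  price_predict = c(200, 150, 120, 100),  # predicted nightly price
  days_predict  = c(10, 8, 6, 5)          # predicted days entering the revenue product
)

service_fee <- 0.03  # 3% platform fee, as stated in the text

# Revenue per listing, net of the service fee.
listings$revenue <- listings$price_predict * listings$days_predict * (1 - service_fee)

# Average net revenue per district.
district_revenue <- aggregate(revenue ~ district, data = listings, FUN = mean)

# Long-term monthly rents for the same two districts.
long_term_rent <- data.frame(district = c("Centrum", "West"), rent = c(538, 529))

# Side-by-side comparison: positive values favour short-term rental.
comparison <- merge(district_revenue, long_term_rent, by = "district")
comparison$difference <- comparison$revenue - comparison$rent
print(comparison)
```

Because the fee is a fixed percentage, applying it per listing (as here) or to the district mean gives the same district-level figure.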

datas<- benefit_an[sample(nrow(benefit_an),10),] 

District <- c("Centrum", "West", 'Nieuw-West', 'Zuid', "Oost", "Noord", "Zuidoost")
Observed_Long_term_Rent <- c(538, 529,560,560,566,574,540)
Predicted_Airbnb_Revenue <- (c(1732.6, 829.7, 479.1, 785.5, 677.7, 639.5, 785.5))*0.97
benefit_an2 <- data.frame(District ,Observed_Long_term_Rent,Predicted_Airbnb_Revenue)%>%
  mutate(Comparison = Predicted_Airbnb_Revenue-Observed_Long_term_Rent )


library("kableExtra")
benefit_an2 %>%
  kbl(caption = "Predicted Airbnb Revenue Compared with Observed Long-term Housing Price in Each District")%>%
  kable_minimal()
Predicted Airbnb Revenue Compared with Observed Long-term Housing Price in Each District

District      Observed_Long_term_Rent   Predicted_Airbnb_Revenue   Comparison
Centrum                           538                   1680.622     1142.622
West                              529                    804.809      275.809
Nieuw-West                        560                    464.727      -95.273
Zuid                              560                    761.935      201.935
Oost                              566                    657.369       91.369
Noord                             574                    620.315       46.315
Zuidoost                          540                    761.935      221.935

The column “Comparison” is calculated as “Predicted_Airbnb_Revenue” minus “Observed_Long_term_Rent”. The predicted Airbnb revenue is higher than the observed long-term rental market price in every district except Nieuw-West, where it is about 95 lower. The predicted Airbnb revenue in the “Centrum” district is much higher than in the other districts, yet Centrum has the second lowest price in the long-term rental market. This is an interesting finding. We think it may be because the target customer groups of long-term and short-term rentals are different: the former are residents who need to live in the city for a while, and the latter are short-stay travelers.

7. Accuracy and Generalizability

a. Accuracy

i. Train and Test Set Examining

ggplot() +
  geom_sf(data = neighbourhoods_geo) +
  geom_sf(data = test_set_result, aes(colour = q5(price.AbsError)), na.rm = TRUE,
          show.legend = "point", size = 1.25) +
  scale_colour_manual(values = palette5blue,
                   labels=qBr(test_set_result,"price.AbsError"),
                   name="Absolute Residual") +
  labs(title="Absolute Residual of Prediction", caption  = 'Map7-1') +
  mapTheme()

The absolute residual map shows that our price regression model tends to produce larger residuals in central areas.

ii. Spatial Lags

coords.test <-  st_coordinates(test_set_result) 

neighborList.test <- knn2nb(knearneigh(coords.test, 5))

spatialWeights.test <- nb2listw(neighborList.test, style="W")

test_set_result <- test_set_result %>%
  mutate(lagPriceError = lag.listw(spatialWeights.test, price.Error, na.rm = T))


test_set_result <- test_set_result %>%
  mutate(lagPrice = lag.listw(spatialWeights.test, price, na.rm = T))


ggplot(test_set_result, aes(x=lagPriceError, y=price)) + geom_point()+
  geom_smooth(method=lm, color='red')+
  labs( title = 'Error as a Function of the Spatial Lag of Airbnb Price',caption = 'Figure7-1' , subtitle = sprintf('correlation = %s',round(cor(test_set_result$lagPriceError, test_set_result$price), 2)))+
  
  plotTheme()

The plot of the spatial lag of price error against price can be interpreted as showing that, as Airbnb price errors increase, nearby Airbnb price errors decrease slightly. However, the correlation is relatively weak, so we cannot conclude whether the listings are spatially autocorrelated. Further analysis using Moran’s I helps to draw a conclusion.

ggplot(test_set_result, aes(x=lagPrice, y=price)) + geom_point()+
  geom_smooth(method=lm, color='red')+
  labs( title = 'Price as a Function of the Spatial Lag of Airbnb Price \n spatial lag of price (mean price of 5 nearest neighbors)',caption = 'Figure7-2' , subtitle = sprintf('correlation = %s',round(cor(test_set_result$lagPrice, test_set_result$price), 2)))+
  
  plotTheme()

In the scatterplot of price against lagged price, we observe that as the price of a listing increases, the price of nearby Airbnb listings increases as well. The correlation between the spatial lag of price and the price is 0.39, which is moderate rather than strong. Thus, we do not treat it as substantial evidence for the clustering of Airbnb prices.
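To make the notion of a spatial lag concrete, here is a minimal base-R sketch. With k nearest neighbours and row-standardised weights (style "W"), the spatial lag of a variable is simply the mean of that variable over each listing's k nearest neighbours. The coordinates, prices and the choice k = 2 below are invented for illustration; the report itself uses k = 5 through `knearneigh`, `nb2listw` and `lag.listw`.

```r
# Hypothetical coordinates and prices for six listings:
# four cheap listings close together and two expensive ones far away.
coords <- cbind(x = c(0, 1, 0, 1, 10, 11), y = c(0, 0, 1, 1, 10, 10))
price  <- c(100, 110, 90, 105, 300, 320)
k <- 2  # number of nearest neighbours (illustrative choice)

# Pairwise Euclidean distances; a listing is not its own neighbour.
d <- as.matrix(dist(coords))
diag(d) <- Inf

# Spatial lag: mean price of the k nearest neighbours of each listing.
lag_price <- apply(d, 1, function(row) mean(price[order(row)[seq_len(k)]]))

print(unname(lag_price))
print(cor(lag_price, price))
```

A positive correlation between `lag_price` and `price` means that expensive listings tend to sit next to other expensive listings.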

# Moran's I

moranTest <- moran.mc(test_set_result$price.Error, 
                      spatialWeights.test, nsim = 999)

ggplot(as.data.frame(moranTest$res[c(1:999)]), aes(moranTest$res[c(1:999)])) +
  geom_histogram(binwidth = 0.01) +
  geom_vline(aes(xintercept = moranTest$statistic), colour = "#FA7800",size=1) +
  scale_x_continuous(limits = c(-1, 1)) +
  labs(title="Observed and permuted Moran's I",
       subtitle= "Observed Moran's I in orange",
       x="Moran's I",
       y="Count",
       caption = 'Figure 7-3') +
  plotTheme()

In the Moran’s I plot, the observed Moran’s I (shown in orange) is not large. Based on this result, we conclude that there is no significant spatial autocorrelation in the errors of our price regression.
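For readers unfamiliar with the statistic, the following base-R sketch shows how Moran's I and a permutation test of this kind can be computed by hand. It is an illustration only: the function `morans_i`, the six residual values and the chain-shaped neighbour structure are all hypothetical, and the pseudo p-value shown is one common convention rather than a claim about the internals of `moran.mc`.

```r
# Moran's I for values x and a spatial weights matrix w.
morans_i <- function(x, w) {
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Hypothetical setup: six locations on a line, neighbours are adjacent positions.
n <- 6
w <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
  w[i, i + 1] <- 1
  w[i + 1, i] <- 1
}
w <- w / rowSums(w)  # row-standardise, as with style = "W"

# Residuals that are clearly clustered: low values on one side, high on the other.
residuals_demo <- c(-3, -2, -1, 1, 2, 3)
observed <- morans_i(residuals_demo, w)

# Permutation test: shuffle the values over locations and recompute the statistic.
set.seed(1)
permuted <- replicate(999, morans_i(sample(residuals_demo), w))
p_value <- (sum(permuted >= observed) + 1) / (999 + 1)

print(observed)
print(p_value)
```

An observed value far in the right tail of the permuted distribution indicates clustering; a value inside the bulk of the histogram does not.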

iii. Predicted Price as A Function of Observed Price

ggplot(test_set_result, aes(x=price.Predict, y=price)) + 
  geom_point()+
  geom_smooth(method=lm, colour="red") +
  geom_abline(color = '#BFC3D6', size = 1)+
  labs( title = 'Price as a Function of Predicted Price',subtitle = sprintf('correlation = %s',round(cor(test_set_result$price.Predict, test_set_result$price), 2)),  caption = 'Figure 7-4')+
  plotTheme()

In the plot of observed price against predicted price, our regression appears to perform well for listings priced below $300-$350 per night. The variance starts to increase for more expensive listings, which indicates that there is still room for improvement in our model. The grey line in the graph represents a perfect match between the predicted price and the observed price, and the red line is the fitted trend between the two. Because the two lines are very close to each other, we conclude that our model has acceptable accuracy.

b. Generalizability

i. Cross-validation

fitControl <- trainControl(method = "cv", number = 100)
set.seed(825)

reg.cv <- 
  train(price ~ ., data = as.data.frame(listing_training) %>%
              dplyr::select( price, host_response_rate, host_acceptance_rate, host_listings_count, host_total_listings_count, neighbourhood_cleansed, room_type, accommodates,property_type, bathrooms_text,  minimum_nights, maximum_nights, availability_30, number_of_reviews, review_scores_rating, reviews_per_month, host_has_profile_pic, host_identity_verified, beds,bedrooms,landmark_nn3,landmark_nn4,trans_nn3,trans_nn5,wifi,tv ,private, clean, quite), 
     method = "lm", trControl = fitControl, na.action = na.pass)

reg.cv
## Linear Regression 
## 
## 14704 samples
##    28 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (100 fold) 
## Summary of sample sizes: 14558, 14557, 14556, 14556, 14557, 14556, ... 
## Resampling results:
## 
##   RMSE      Rsquared   MAE     
##   88.29713  0.5024856  61.76172
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE
hist(reg.cv$resample[,3],
     main="Cross-Validation MAE Histogram Chart",
     xlab="Distribution of Mean Absolute Errors ",
     sub = 'Figure 7-5',
     col = 'skyblue3',
     breaks =50)

The histogram of cross-validation MAE values is approximately normally distributed and tightly clustered, which suggests that our model has acceptable generalizability.
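The idea behind the fold-level MAE values can be sketched without `caret`. The example below is a simplified stand-in, not our actual pipeline: the data are simulated, the single predictor `bedrooms` and the coefficients used to simulate `price` are arbitrary, and 10 folds are used instead of 100 to keep it short.

```r
set.seed(825)

# Simulated data: price depends linearly on bedrooms plus noise (arbitrary values).
n <- 200
sim <- data.frame(bedrooms = sample(1:4, n, replace = TRUE))
sim$price <- 60 + 30 * sim$bedrooms + rnorm(n, sd = 20)

# Assign every row to one of k folds at random.
k <- 10
fold <- sample(rep(seq_len(k), length.out = n))

# For each fold: fit on the other folds, predict the held-out fold, record its MAE.
fold_mae <- sapply(seq_len(k), function(i) {
  fit  <- lm(price ~ bedrooms, data = sim[fold != i, ])
  pred <- predict(fit, newdata = sim[fold == i, ])
  mean(abs(pred - sim$price[fold == i]))
})

print(mean(fold_mae))
print(sd(fold_mae))
```

A small spread of `fold_mae` relative to its mean is what a tightly clustered histogram reflects: the error does not depend much on which rows were held out.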

ii. Mean Absolute Percentage Error (MAPE) by Neighborhood

#listings_prices <- airbnb_geo %>%
  #dplyr::select(id, price)
library(gghighlight)
pricesERROR.in.neighbors <- #spatial join of neighborhood and listing information
  neighbourhoods_geo %>%
  st_join(test_set_result, join = st_intersects)%>%
  dplyr::select(-neighbourhood_group)



neighbors_mean_priceERROR <- pricesERROR.in.neighbors %>%   #the mean MAPE (in percent) of each neighborhood
  group_by(neighbourhood_cleansed)%>%
  summarise(mean_MAPE_percent = mean(price.APE *100,  na.rm = T))

ggplot() +
  geom_sf(data=neighbors_mean_priceERROR, aes(fill = q5(mean_MAPE_percent)))+
  scale_fill_manual(values = palette5blue,
                    labels = qBr(neighbors_mean_priceERROR, "mean_MAPE_percent"),
                    name = "mean MAPE% \n(Quintile Breaks)")+
  labs(title = "Airbnb Model Mean MAPE in Each Neighborhood",caption = 'Figure 7-6')

Finally, the mean MAPE map shows that our model did not perform well in parts of the Nieuw-West, Noord, and Zuidoost districts. We believe this may be caused by spatial information being omitted from the regressions.
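As a small worked example of the neighborhood-level MAPE, the sketch below uses four invented listings in two hypothetical neighborhoods "A" and "B". It computes the absolute percentage error in two ways: scaled by the predicted price, as in our `price.APE` column, and scaled by the observed price, which is the more conventional definition. The column names `price_predict`, `ape_predicted` and `ape_observed` are illustrative only.

```r
# Hypothetical observed and predicted prices.
obs <- data.frame(
  neighbourhood = c("A", "A", "B", "B"),
  price         = c(100, 200, 50, 150),
  price_predict = c(125, 160, 100, 120)
)

abs_error <- abs(obs$price_predict - obs$price)

# Scaled by the predicted price (the convention used for price.APE in this report).
obs$ape_predicted <- abs_error / obs$price_predict
# Scaled by the observed price (the conventional MAPE definition).
obs$ape_observed <- abs_error / obs$price

# Mean APE per neighbourhood, expressed in percent.
mape <- aggregate(cbind(ape_predicted, ape_observed) ~ neighbourhood,
                  data = obs, FUN = function(v) mean(v) * 100)
print(mape)
```

The two definitions can diverge noticeably when the model overpredicts cheap listings, as in neighborhood "B" here, so the choice of denominator matters when comparing MAPE values across studies.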

8. Conclusion

Looking at the mean short-term rental revenue of each district, the “Centrum” district clearly has much higher revenue than the other districts. In the long-term rental market, however, the districts have similar prices and the “Centrum” district has the second lowest price, which contrasts with the short-term pattern. We think this may reveal a difference in rent driven by the different needs of tenants. The central area of the city tends to have more convenient public transportation and denser service facilities, which are factors that short-term travelers care more about. Therefore, they are willing to pay higher prices in the city center. On the other hand, long-term renters are usually people who need to live in the local area for a period of time. For them, a quiet and beautiful area may be more attractive than a noisy city center.

In general, if you have an apartment in the center of Amsterdam, our predictions indicate that it is clearly more profitable to put it on Airbnb as a short-term rental. In every district except “Nieuw-West”, Airbnb is predicted to bring hosts higher revenue than the local long-term rental market.

listings_prices <- airbnb_geo %>%
  dplyr::select(id, price)



prices.in.neighbors <- #spatial join of neighborhood and listing information
  neighbourhoods_geo %>%
  st_join(listings_prices, join = st_intersects)%>%
  dplyr::select(-neighbourhood_group)

neighbors_mean_price <- prices.in.neighbors %>%   #the mean price of each neighborhood
  group_by(neighbourhood)%>%
  summarise(mean_price = mean(price))

ggplot() +
  geom_sf(data=neighbors_mean_price, aes(fill = q5(mean_price)))+
  scale_fill_manual(values = palette5blue,
                    labels = qBr(neighbors_mean_price, "mean_price"),
                    name = "mean_price\n(Quintile Breaks)") +
  geom_sf(data=airbnb_geo, color = 'red', size = 0.5)+
  labs(title = "Airbnb Listings in Each Neighborhood", subtitle = "Figure 7-6")

9. Limitations

a. Prices prediction

Overall, our predictions rest on the idealized assumption that hosts have no expenses other than the 3% Airbnb service fee, which means they keep 97% of the listing price as income. In reality, short-term rental landlords often face additional costs, such as damage to the house and its facilities.

Also, the rent for a long-term lease covers only the dwelling itself, and the tenant pays additional fees for water, electricity, internet, and so on. Short-term listings include these expenses in the price, which also makes short-term rental income appear higher than long-term rental income.

b. Available days prediction

We use the “available_30” column from the original data set as the number of days in the next 30 days on which an apartment is not rented on Airbnb. The number of rented days is therefore 30 minus this value. For example, if an apartment has an “available_30” value of 7, we consider it rented out for 30 - 7 = 23 of the next 30 days. However, some apartments have an “available_30” of x days only because the host has opened just x of the next 30 days for booking, not because the remaining days are booked. We may therefore be overestimating the number of days an apartment is rented on Airbnb.
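The calculation described above can be sketched in a few lines of base R. This is only an illustration: the data frame `listings`, its values, and the new column `booked_30` are hypothetical, and only the column name `available_30` comes from the data set.

```r
# Hypothetical listings; only the column name available_30 comes from the data set.
listings <- data.frame(
  id = c(1, 2, 3),
  available_30 = c(7, 0, 30)
)

# Days treated as rented out in the next 30 days: 30 minus the available days.
listings$booked_30 <- 30 - listings$available_30

print(listings)
```

Note that the first listing is counted as rented for 23 days even if its host simply opened only 7 days for booking, which is exactly the overestimation risk described above.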

c. Comparison

After predicting the prices of the apartments and the number of rented days within 30 days, we multiply the two to obtain the hosts’ 30-day short-term rental income from Airbnb. We then calculate the mean short-term income of each district and compare it with the long-term rental prices of the local rental market. Even within the same district, however, the prices of different apartments may vary greatly, which makes our result less useful to hosts who want to compare prices in a more precise geographical area. We could provide a more precise comparison if we obtained the mean long-term rents of the local rental market for smaller areas.
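As a rough sketch of this comparison, the base R snippet below multiplies price by rented days, applies the 97% host share assumed in section a, and averages by district. All object names (`predictions`, `pred_price`, `pred_booked_30`, `long_term`) and every number are made up for illustration; they are not our model output or the actual long-term rents.

```r
# Hypothetical predictions; every value here is invented for illustration.
predictions <- data.frame(
  district       = c("Centrum", "Centrum", "West", "West"),
  pred_price     = c(200, 160, 100, 120),  # predicted nightly price
  pred_booked_30 = c(20, 25, 10, 15)       # predicted rented days in the next 30 days
)

host_share <- 0.97  # assumption from section a: hosts keep 97% after the 3% service fee

# 30-day short-term income per listing
predictions$revenue_30 <- predictions$pred_price * predictions$pred_booked_30 * host_share

# Mean 30-day short-term income per district
short_term <- aggregate(revenue_30 ~ district, data = predictions, FUN = mean)

# Hypothetical long-term monthly rents per district
long_term <- data.frame(
  district       = c("Centrum", "West"),
  long_term_rent = c(1800, 1900)
)

# Side-by-side comparison per district
comparison <- merge(short_term, long_term, by = "district")
comparison$airbnb_higher <- comparison$revenue_30 > comparison$long_term_rent

print(comparison)
```

Replacing `district` with a smaller spatial unit, such as the neighborhood, would give the finer comparison mentioned above, provided long-term rents are available at that level.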